Improved smoothed analysis of the k-means method
نویسندگان
چکیده
The k-means method is a widely used clustering algorithm. One of its distinguished features is its speed in practice. Its worst-case running-time, however, is exponential, leaving a gap between practical and theoretical performance. Arthur and Vassilvitskii [3] aimed at closing this gap, and they proved a bound of poly(nk, σ−1) on the smoothed running-time of the k-means method, where n is the number of data points and σ is the standard deviation of the Gaussian perturbation. This bound, though better than the worst-case bound, is still much larger than the running-time observed in practice. We improve the smoothed analysis of the k-means method by showing two upper bounds on the expected running-time of k-means. First, we prove that the expected running-time is bounded by a polynomial in n √ k and σ−1. Second, we prove an upper bound of kkd · poly(n, σ−1), where d is the dimension of the data space. The polynomial is independent of k and d, and we obtain a polynomial bound for the expected running-time for k, d ∈ O( √ log n/ log log n). Finally, we show that k-means runs in smoothed polynomial time for one-dimensional instances.
منابع مشابه
Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm
Identifying clusters or clustering is an important aspect of data analysis. It is the task of grouping a set of objects in such a way those objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis This paper proposed an improved version of K-Means algorithm, namely Persistent K...
متن کامل2 00 9 k - Means has Polynomial Smoothed Complexity
The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are u...
متن کاملCombination of Transformed-means Clustering and Neural Networks for Short-Term Solar Radiation Forecasting
In order to provide an efficient conversion and utilization of solar power, solar radiation datashould be measured continuously and accurately over the long-term period. However, the measurement ofsolar radiation is not available to all countries in the world due to some technical and fiscal limitations. Hence,several studies were proposed in the literature to find mathematical and physical mod...
متن کاملNumerical Investigation of Vertical and Horizontal Baffle Effects on Liquid Sloshing in a Rectangular Tank Using an Improved Incompressible Smoothed Particle Hydrodynamics Method
Liquid sloshing is a common phenomenon in the transporting of liquid tanks. Liquid waves lead to fluctuating forces on the tank wall. If these fluctuations are not predicted or controlled, they can lead to large forces and momentum. Baffles can control liquid sloshing fluctuations. One numerical method, widely used to model the liquid sloshing phenomena is Smoothed Particle Hydrodynamics (SPH)....
متن کاملWorst-Case and Smoothed Analysis of the k-Means Method with Bregman Divergences
The k-means algorithm is the method of choice for clustering large-scale data sets and it performs exceedingly well in practice despite its exponential worst-case running-time. To narrow the gap between theory and practice, k-means has been studied in the semi-random input model of smoothed analysis, which often leads to more realistic conclusions than mere worst-case analysis. For the case tha...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009